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Abstract. In this paper we apply a Bayesian framework to the problem of geodesic curve 
matching. Given a template curve, the geodesic equations provide a mapping from initial conditions 
for the conjugate momentum onto topologically equivalent shapes. Here, we aim to recover the 
well-defined posterior distribution on the initial momentum which gives rise to observed points on 
the target curve; this is achieved by explicitly including a reparameterisation in the formulation. 
Appropriate priors are chosen for the functions which together determine this field and the positions 
of the observation points, the initial momentum po and the reparameterisation vector field v, 
informed by regularity results about the forward model. Having done this, we illustrate how 
Maximum Likelihood Estimators (MLEs) can be used to find regions of high posterior density, 
but also how we can apply recently developed Markov chain Monte Carlo (MCMC) methods on 
G function spaces to characterise the whole of the posterior density. These illustrative examples also 

include scenarios where the posterior distribution is multimodal and irregular, leading us to the 
T-H conclusion that knowledge of a state of global maximal posterior density does not always give us 

the whole picture, and full posterior sampling can give better quantification of likely states and the 
overall uncertainty inherent in the problem. 
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1. Introduction 

Geodesies on shape space induced from diffeomorphisms are proving to be a powerful tool in the 
field of computational anatomy fD, 12]. Not only do they provide a notion of distance between 
topologically equivalent shapes, they also provide a linear characterisation of patches of shape 
space centred on a template image using the initial conditions of the conjugate momentum along 
a geodesic [3]. This permits the application of linear statistical techniques (the adaptation of 
principal component analysis known as principal geodesic analysis, for example [3]) since the 
momentum inhabits a linear cotangent space. Bayesian statistical techniques have been developed 
in this area in recent years, with advances in methods of determining the most likely template for 
a dataset as well as the most likely metric [HI El [7] . There are many different shape spaces: the 
shape of curves, landmarks, images, fibre bundles etc., but there is a unifying paradigm since in 
all cases the underlying velocity field that generates the diffeomorphisms that act on the shapes 
evolves according to the EPDiff equation [8] . 

In this paper, we shall address the following inverse problem: given noisy observations of points 
around a curve (which may have been obtained from a segmentation algorithm or human input), 
what is the initial momentum that generates this curve from a given template? We concentrate 
on the parameterisation-independent problem, in which we treat curves as being equivalent if 
they are related via reparameterisation. This is a challenging problem which has been addressed 
in a few different ways [HI ID], but here we shall make use of the approach of [TT] in which a 
reparameterisation variable is included explicitly, and the commutation of the reparameterisation 
with the geodesic flow equations is exploited. In the inverse problem nomenclature, this allows us 
to define an observation operator that produces point values on the curve, at a cost of introducing 
this extra variable. 

Our inverse problem can be summarised as follows. For a chosen metric, equations ^ 
[T]) together with condition ([s]) describe the motion along the geodesic path in the space of 
unparameterised curves, written in terms of a specific curve parameterisation q{s,t), and a 
conjugate momentum p{s,t), where s is the parameterisation variable around the curve, and t 
is time, i.e. the parameter along the geodesic. The initial momentum p{s,0) := Po{s) completely 
determines the direction of the geodesic, and thus the parameterisation-independent geodesic 
between two parameterised curves q^{s) and g^(s) can be completely described by the initial 
momentum po{s) and a reparameterisation map r]{s) which is identified through a one-dimensional 
vector field i^(s). Given a template curve q^, po completely describes the final curve q^ and u 
completely describes a particular parameterisation of g^. We wish to identify the curve which 
is most consistent with a set of observed data points, and we do this by defining a map Q{po, z/) 
which returns a finite set of points on g^. The inverse problem is to solve y = Q{pq, u) + t] given 
finite data y and observation noise rj. Since the inputs to Q are functions and the output is a 
finite-dimensional vector, the problem is underdetermined. 

We choose to adopt the Bayesian approach in this paper, for two main reasons. Firstly, 
the inverse problem of finding the geodesic fiow map which deforms one shape into another from 
observations of the second shape is ill-posed and underdetermined. Secondly, the observations 
that are made are noisy in nature, and as such a probabilistic setting in which we can quantify the 
uncertainty inherent in the problem is appropriate. As the unknown quantities that we would like 



Bayesian data assimilation in shape registration 



3 



to recover are functions, we must take extra care to ensure that the problem is formulated non- 
parametrically|12]. As such, we look to follow the methodologies that have been utilised previously 
in various fluid mechanics problems [131 [IH US] ■ 

Once the Bayesian inverse problem has been defined in such a way as to be well-posed on 
function space, we must consider how appropriate algorithms can be implemented on a computer 
in order to characterise the posterior probability distributions on the unknown functions. The 
unknown functions are of course discretised within the algorithms, and because it may be desirable 
to refine the mesh to obtain further accuracy, it is important that the MCMC methods used to 
sample the posterior converge in distribution at rates independent of the discretisation used. A 
range of MCMC methods which are well-defined on function space (and as such have mixing rates 
which are independent of discretisation of those functions) are available |16]. and have been used 
for Bayesian inverse problems on function spaces 115]. The properties of these algorithms make 
them a highly viable candidate for the analysis of the problems identified in this paper. Specifically, 
a function space analogue of the random walk Metropolis-Hastings (RWMH) algorithm will be used 
in the numerical examples in this paper, referred to in [121 [15] as the pCN algorithm. 

In Section [2] we introduce the problem, and describe the equations of motion of the shape as it 
is deformed by the velocity field, and demonstrate how we can find geodesic paths in shape space 
between two shapes. We also show in this section that the deformed shape is Lipschitz continuous 
with respect to two functions, the initial momentum pq, and the reparameterisation function u. In 
order to tackle an inverse problem on function spaces, it is necessary to show that this problem 
is well-posed. This amounts to finding function spaces on which the forward problem is locally 
Lipschitz, and in Section[3]we find sufficient conditions on the regularity of these spaces. In Section 
|4] we frame the inverse problem as a well-posed Bayesian inverse problem, and identify minimum 
regularity priors for the functions, informed by the analysis in Section |3} 

In Section |5] we describe the Markov chain Monte Carlo (MCMC) algorithm that we use to 
sample directly from the well-defined posterior density on po and z/. In Section [6] we briefly discuss 
how we numerically approximate the dynamics described in Section |2| before presenting some 
illustrative numerical examples in Section [7| We finish in Section [8] with some conclusions and 
discussion. 

2. Description of the forward model 

In this section, we review the equations of motion for curves in the plane acted on by geodesies 
in the diffeomorphism group, and explain how these equations can be used to provide a mapping 
from the space of periodic scalar functions (which turn out to be the normal component of a 
conjugate momentum variable) to topology-preserving nonlinear deformations of some chosen curve 
in the plane. We then provide some analytical results that are required for defining the associated 
Bayesian inverse problem. 

In our approach, we parameterise an oriented curve in the plane as a continuous function q 
from a space S (such as the circle, S^) into i.e., q G C°(S'^, M^). Although in this paper we 
concentrate on the lower dimensional case of a curve in the plane, it may be extended to surfaces 
in M^, which provides many applications in medical imaging, for example. We parameterise an 
evolving curve as g(s, t), where s E S is the parameter around the curve, and t G [0, 1] is the time 
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parameter. Following the methodology of [TBI [TO] we constrain the motion of the curve q{s, t) 
to the action of diffeomorphisms by requiring that 

^^q{s,t)=u{q{s,t),t) (1) 

where u{x,t) is a time-parameter ised family of vector fields on M^. This guarantees that the 
topology of the curve is preserved {i.e. there are no overlaps or cavitations). We wish to study curve 
evolution from a template curve (parameterised by q^{s)) to a target curve (parameterised 
by q'^{s)). However, since we only want information about the shape of the curve, and not the 
specific parameterisation, we do not wish to enforce that any specific point q^{s) gets mapped to 
any specific point on F^. We instead use the following generahsed boundary condition 

q{s,0) = q\v{s)), q{s,l)=q\s), (2) 

for arbitrary reparameterisations rj G Diff+(5'), the orientation-preserving subgroup of the 
diffeomorphism group Diff(S') on S. If the boundary conditions ^ are satisfied, we say that 
u describes a path between F^ and F^. We select a function space B for vector fields, and define 
the distance along the path as 
1 1 

^Mldt. (3) 

For simplicity we assume that B is a Hilbert space and that there exists an operator A such that 

\\u\\1 = {u,Au)l^. (4) 

The shortest path between F^ and F^ is defined by minimising ([s]) over u and rj subject to Q and 
the boundary conditions ([2]). To obtain the equations of motion for the shortest path, we introduce 
Lagrange multipliers p{s,t) (which we call the "momentum") to enforce ([T|, and seek extrema of 
the action 

1 

J 2^Mb + {P^'i-u{q))dt. 

This leads to the following equations of motion in weak form: 

{6u,Au)l2- {p,6u{q)) =0, (5) 

5p,|f-«(g))dt =0, (6) 

^'(p,^-(V^g))5g^dt = 0, (7) 

where 6p and 6q are space-time test functions, with 

u,SuGB, p,SpGL'^, q,SqEH^. 
If the minimisation is taken over all reparameterisations rj G Diff4.(S') then we obtain the condition 

p-^ = 0, WG[0,1]. (8) 

The condition states that the momentum p is normal to the shape. As described in [Hj, if this 
condition is satisfied, then the curve evolution is invariant under the transformation 

{p,q) ^ {p,q) = (pori^,qor]\ (9) 
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for rj G Diff_|_(S'), which is the cotangent-hft of the transformation q q o rj. This means that 
if equations (|5]j7]) are solved with initial conditions {p, q) and (p, q) then q = qo rj ai time 1. As 
described in [12] , solutions that satisfy equation [s] are parameterised realisations of geodesies on 
the shape space M^)/ Diff+(^). 

We define the time-1 flow map \l/ 

*(p|t=o,g|t=o) = q\t=i, 

where p and q satisfy equations (|5]j7|. Having fixed q\t=o = q^, we define a parameterisation- 
independent mapping between scalar functions Pq : S ^ M. (the normal component of the initial 
conditions of p) and topologically equivalent curves q^ obtained by 

q^ = '^{pon,q^), 

where n is the normal vector to the curve q^. The power of this mapping is that it allows us to 
perform linear operations on the space of functions po, such as averages. 

In this paper, our aim is to estimate the probability measure for po given a set of observed 
points (gi, . . . ,qn) from the curve g^. These points may have been obtained from a digitised medical 
image, for example. These points are assumed to be sorted in order according to the orientation 
of the curve. We cannot directly solve the inverse problem of finding pq such that 

*(Po^,g^)(si) = gi, i = l,...,n, 

since we do not know the values Sj of the curve parameter s that correspond to each g^. To fix 
this, we introduce a reparameterisation variable rj G Diff_|_(S'^), and seek {po,ri) such that 

^ (^{Pon) °V^,(1^ (si) = qi, i = l,...,n, 

with {si}^^i is some chosen distinct sequence of points in S ordering according to orientation, 
e.g. equispaced points. This guarantees to preserve the ordering since the curve is evolved by a 
diffeomorphism. The introduction of the reparameterisation variable does not alter the shape of 
the obtained curve, just the particular parameterisation used. 

Following [20], we construct reparameterisations from scalar periodic functions u by solving 

^{s,t) = u{xis,t)), x{s,0) = ld, x(0,t) = x(l,t),0<t, (10) 

where Id is the identity map (Id(s) = s). The reparameterisation map is then evaluated from 
rj{s) = x{s, 1). The function u represents the generating vector field for the reparameterisation rj on 
the scalar interval [0, 1] with periodic boundary conditions. This is known as Lie exponentiatiorl^ 
and guarantees that r] is an orientation preserving, smooth invertible map (for sufficiently smooth 
i>). This defines a reparameterisation map: 

(pon) or]—,q^or] 

where t] is obtained from (10). An illustrative diagram is given in Figure [l] 

I As described in |21J, not all reparameterisations of the circle can be obtained from Lie exponentiation, and a 
more general approach would be to generate reparameterisations from geodesies in a way that is similar to how the 
deformations of the curves are obtained (known as Riemann exponentiation). 
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Figure 1. Diagram illustrating the forward model. Curves are shown with highlighted points to 
indicate the parameterisation. The reparameterisation velocity v generates a reparameterisation 
of the curve (which also transforms the initial momentum pq), and then the (transformed) initial 
momentum po generates an evolution of the curve by the time-1 flow map of equations (5][7|. For 
the undiscretised equations changing v but not po leads to different parameterisations of the same 
curve, since reparameterisation and the time-1 flow map commute. 



Hence we define the observation operator Q by 



(11) 



V Gisn) J 

where G = \1/ o TZ[pQn,q^,i'). The normal component of Pq then characterises the shape of the 
target curve relative to the curve F^, whilst the generator variable u merely describes the 
reparameterisation of the target curve which is obtained at the minimum. 



3. Properties of the Observation Operator 



In this section, we show that the observation operator Q, given in Equation (11 ), is Lipschitz in po 
and z/, for appropriately chosen spaces. We make the following definition. 

Definition 1. We say that a Hilbert space B of vector fields is embedded in C"'{Q,M.'^) if there 
exists a constant Ce s.t. ||f ||n,oo < C'ell'^Hs- In that case we also say for notational convenience 
that B is n— admissible. 



We adopt the notation that u G D. The main result of this section is the following Theorem. 
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Theorem 2. (i) Let B he 2— admissible, and D he 1 — admissible. Then for go ^ W'^'°° , the 
observation operator Q is a Lipschitz continuous map from {L'^,D) to M^*^. 

(a) Let B be 3— admissible and D be 1— admissible. Then for po G H^^ and go ^ W'^'°° , the 
observation operator Q is Lipschitz continuous. 

To show that Q is Lipschitz, we first prove existence, uniqueness and Lipschitz continuity for 
the maps \l/ and TZ. This is then used to show that the observation operator is Lipschitz continuous 
with respect to the normal component po of the initial conditions p and the generator variable v. 

Let B be the space of vector fields, Q be a Hilbert space and P be its dual. Here Q represents 
a space of curves with a particular norm, for example Q = L'^{Si, M) or Q = H^{Si, R). We want 
to define the momentum map as given by 

J : {p,q) e P X Q ^ J{p, q) := (^v ^ {p,vo g)^^) G B* . (12) 

Here B* represents the dual space of B, the space of linear operators acting on B. In this aim, we 
assume in addition that the map B x Q Q defined by (f , g) — > f o g is a Lipschitz map on Q for 
a fixed V E B namely there exists Cc s.t. 

\\voqi-vo gsllg < Cc||^;||B||gi - g2||Q • (13) 

As a consequence, since this composition map is linear on B, the map is Lipschitz on B x Q. This 
assumption makes J{p, g) well-defined and Lipschitz on P x Q for the dual norm on B*. 

Lemma 3. The assumption above is satisfied in the following situations 

• B is 1 — admissible and P = Q = L^(5'i,]R). 

• B IS 2-admissible, P = H-\Si,R) and Q = H\Si,R). 

Proof. For both points, we use the integration formula for the path g(t) = tgi — (1 — t)go 

/(g(l))-/(g(0))= f df{q{t)){qi-q,)dt, (14) 

for / G C^. For the first point, for any / G Ci'°°(M'^, M'^), 

11/ o gi - / o g2||i2 < ll/lli.oollgi - g2||L2 , (15) 

which gives the result. For the second point, we note that iJ^(S'i,]R) is Banach algebra, namely 
for any couple (/,(?) G if^(S'i,]R) we have 

WfgWm < ll/lloolklli/i + ll^7llooll/bi < 4||/||^i||^||^i . 



Therefore in the Formula ( p3| , the term df{q{t)){qi — go) can be understood as the product 

(generalised to M"^) of two elements in H^{Si,M.'^) provided that df{q{t)) belongs to . Moreover 
is stable under the composition with a function bounded for 

\\foq\\m = \\fh,oo\\q\\H^ (16) 

The second point then follows from 

11/ o gi - / o gsll^i < ||/||2,oo||gi - q2\\m , (17) 

□ 
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We write Equations ([SjjTj) as follows: 

q = uo q 

p = — Vu{q).p 

u = J{p,qf, 

where J{p, g)" denotes the corresponding element in B given by the Riesz theorem, i.e. 

{J{p,q)\v)B = J{p,q){v) . 



(18) 
(19) 
(20) 



Equation (19) is a pointwise matrix multiplication when p is smooth. When p E H for instance, 



the operation is defined by duality, defined as follows. 

Definition 4. Let Q be a Banach algebra then the multiplication Q* x Q Q is defined by duality 

{<li-P,(l2) = {p,qiq2) (21) 

Remark 5. The multiplication on a Banach algebra is by definition bilinear continuous. As 
a consequence, the multiplication inherits the same properties and in particular, the Lipschitz 
property. 

We now prove that these shooting equations are a simple ODE in infinite dimensions i.e. 
X = F{X) with state variable X in a Banach space; This function F satisfies the classical Lipschitz 
assumption which means that solutions can be obtained using Picard's fixed point method; it also 
ensures that the solutions are Lipschitz functions of the initial conditions. 



Lemma 6. System (18 20) is an ODE on P x Q which satisfies the Lipschitz condition for the 
two following cases on bounded balls in P x Q: 

• B is 2— admissible, P = and Q = L°° 

• B is admissible, P = H^^ and Q = . 

Remark 7. The space Q = L'^{Si) is not a Hilbert space in this case but only a Banach space. 



Proof. In the two claimed cases, we have by Proposition ^ that {p,q) — )■ J{p,qr is Lipschitz for 



the norm on B. Then using the inequality (15), we deduce that {p,q) — > J{p,qy o g is Lipschitz 



as the composition of two Lipschitz functions. Thus, we have treated the first component (18) of 
the ODE. 



Using Remark |5| it is then sufficient to prove that {p,q) E P x Q d[J{p,qy]{q) E Q is a 
Lipschitz mapping. Denoting Vi = J{pi, qi)^ for i = 1,2, use of the triangle inequality gives 

\\dvi oqi- dv2 o g2||Q < C'e||^^i||B|ki - q2\\Q + \\dvi o - dv2 o gallg 

< Cell 171 II B II gi - g2||Q + Cell^l - ^^2 || B || g2 || g , (22) 



where we rely on the inequality (17) for the case and (15) for the first case. 



Up to this point, we only have proven a local Lipschitz condition. It remains to prove that for 
(PO) qo) £ B(0, r) d P xQ with r a positive real number, there exists a constant ^ (the subindex 
indicating that this constant only depends on r and the time t) such that for all time t > the 



system (18 20) is Lipschitz of Lipschitz constant M^. In particular this implies that the solutions 



are defined for all times. To this end, we remark that the geodesic equations (18 20 ) are variational 
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with the Hamihonian is constant in time. This can be checked by a straightforward calculation. 
Since the Hamiltonian reads H{p,q) = \\J{p,qY 



\dvi oqi- dv2 o galle < 3r Ce ||gi 



1^, the Lipschitz inequality (22) becomes 
-Q2\\q. (23) 



For the equation (18), the inequality (13) implies 

||fi oqi-v20 gall < rCc \\qi - g2||Q + ||^^i - 1^2 Us max (||gi | 



but by direct integration of Inequality (13) we see that max(||gi|L , ||g2 



lk2||Q) (24) 
< CcVt for any time 



t > 0. A similar inequality holds for the momentum p, hence the geodesic equations (18 20) are 
Lipschitz for a constant M^t that has polynomial growth in t and in r. 



□ 



We now claim the following result which is based on Gronwall's lemma. 



Lemma 8. Solutions of the geodesic equations (18 20) have a Lipschitz dependency w.r.t. their 
initial conditions for their respective norms. 

We are now in a position to prove Theorem [2j 
Proof. The observation operator Q is the evaluation at points (?7(sj))"^^ of the deformed template 



g(l) obtained by the geodesic equations (18 20) without reparameterisation. In other words, 

/ g(l)or/(si) \ 



Ev{po, go, ly) 



(25) 



V q{l)or]{sn) J 



where g(l) is the solution at time 1 of the geodesic equations (18 20) for initial data po,qo and 
r] is the Lie exponential of u. To prove that the observation operator is Lipschitz in those three 



variables, we observe that it is the composition of the flow of the geodesic equations (18 20) and the 



right composition with the reparameterisation by rj. The first operation is Lipschitz continuous by 
Corollary d8|. Now the right composition is also Lipschitz continuous since g(l) G (indeed 



since B is 2— admissible or 3— admissible, the flow of the geodesic equations (18 20) preserves 
W^'°°) we have 

||g(l)or/i-g(l)or72||oo < || Vg(l) ||oo ||r/i - t/2||oo < M||Vg(l)||oo||^^i - z^2||d • (26) 

□ 



4. Bayesian Inversion 

In this section, we aim to frame the ill-posed inverse problem of finding {po, v) given noisy 
observations of the target shape, as a well-posed Bayesian inverse problem on function space. 
The "solution" to this inverse problem is then given by a probability distribution. Bayes' formula 
allows us to construct, through combining prior beliefs about the functions, with observations of 
the system, the posterior distribution of the possible values that the unknown functions could have 
taken. 
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4.1. Prior Distributions 

Extra care must be taken in this context, since the unknowns are functions, and as such the 
problem must be formulated non-parametrically. Following the philosophy of [I2], we use the 
results from Section |3] to identify minimum regularity priors which will ensure that the problem is 
well-defined. For problems on infinite dimensional spaces, we must identify prior distributions for 
the functions, which have full measure with respect to function spaces on which the observation 
operator is Lipschitz continuous. Theorem [9] summarises these results for the problem presented 
in this paper, and follows at the end of this section. 

The role of the prior in this respect is two-fold. The prior can contain prior information or 
beliefs about the form or structure of the solution, possibly taken from previous observations. 
However another important role of the prior is to regularise the problem, to make it well-posed. 
Indeed it plays a very similar role to the penalty term in a Tikhonov regularisation, an optimisation 



formulation of the inverse problem. We will explore these similarities in Section 5J^, and exploit 
them in our numerical method. 



4-2. Observations and the Likelihood 

Uncertainty in this problem arises from errors incurred during observation. Observational noise 
is often modelled as being additive Gaussian. This uncertainty in the exact shape of the curve 
leads to uncertainty in the value of the pair of functions which would deform the reference shape 
to the observed shape. Understanding and quantifying that uncertainty is a key part of solving 
this problem, as naive least squares matching to the data would lead to an ill-posed problem. 
We assume that observations y of the quantity of interest are noisy in nature, satisfying: 

y = g{p,,u) + ^, e~Ar(o,s), 

where S is assumed to be known, and where Q is defined to be the observation operator given 



by (11). That is, the function which returns to us what noiseless observations we would make if 
we were to deform the template shape and reparameterise using the function pair {po,!/). This 
assumption allows us to compute the likelihood that y was observed with a given [po, u), through 
the probability density function of ^: 

P(?/|po,^^) ocexp (^-l||^(po,zy) , (27) 

where := x'^T,~^x is the covariance weighted norm. 



4-3. Bayes' Theorem and the Posterior 

Bayesian inversion is one way to regularise an ill-posed inverse problem, and convert this into a 
well-posed one. Bayes' formula is central to this. Prior beliefs about the quantities of interest 
are blended with data from observations to give the posterior F{u\y) distribution. The finite 
dimensional version of this formula is 

F{u\y) oc F{y\u)F{u), 

where u denotes the unknown quantity that we wish to characterise from the data, P(-u) 
the probability distribution describing our prior beliefs about those quantities, and f{y\u) the 
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likelihood function, which returns the likelihood that we would make the observations y given a 
particular value of the quantity u. 

However, in this paper the quantity of interest is a pair of functions u = [po, v). In this infinite 
dimensional setting, we must use the analogous result regarding the Radon-Nikodym derivative of 
the posterior distribution with measure /i, with respect to the prior distribution on (po, ^) with 
measure ixq: 

dn{po, v) 



ocP(i/|(po,^^)), (28) 



where the likelihood expression is given by (27) 



Note that this formula only holds if the posterior is indeed absolutely continuous with respect 
to the prior distribution, otherwise no such derivative is admitted. Sufficient conditions on the 
priors for this to be satisfied are given in the following section. 

4.4- Prior and Posterior Distributions on {pq, v) 

Before stating the main result of this Section, we must first introduce some notation. Let us 
consider the Helmholtz operator with periodic boundary conditions 

where £ G M defines a characteristic length scale. Note also that for samples u ~ A/'(0,5'H~") for 
some a > (3 + ^ for some /3 > 0, then u G almost surely (follows from [12] )• The length scale 
parameter i G allows us to control at which scale the smoothing properties of the Laplacian take 
effect. With a larger value of £, a larger value of \k\ is required before the effect of the Laplacian 
becomes dominant. Likewise, as £ — > 0, — ?■ —A, meaning that the Laplacian is dominant on all 
scales. The choice of this value, however, does not affect the overall regularity of samples drawn 
from the distribution Af{0,'H~°'). 

Theorem 9. Let B he 3-admissable, D he 1-admissahle, and fioiPo:^) = A/'(0, ^i?/^"^) x 
XiO, 62^-2"^) for ai > -\, (X2 > 3/2, 61,62 ^ 0, and where Hi = (ij - A) where G M+ 
fori G {1,2}. Then Q is measurahle with respect to /io, and the posterior measure ^ is absolutely 



continuous with respect to /io, with Radon-Nikodym derivative given by (28). 



Proof. Result follows by the Sobolev embedding theorem, results in [12], and Theorem [2] □ 

For the posterior distribution to be well-defined, it must be absolutely continuous with respect 
to the choice of prior distribution on the two unknown functions of interest. This choice of prior 
distribution is informed by the properties of the forward model, and the observation operator Q. 
Theorem |9] gives us Gaussian priors which have sufficient regularity to ensure that this condition 
is met. 

5. Characterising the Posterior Density 



In this section we introduce the statistical methods which we use in order to characterise the 
posterior distribution of the function pair (po, ^) given a choice of prior distribution and a set of 
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noisy observations of the target shape. Monte Carlo Markov chains (MCMC) are a set of tools 
which allow us to sample from a desired probability distribution. A chain of states is formed, 
which converges in distribution to the target density. We then describe a method for initialising 
the MCMC method efficiently. 



5.1. The Function Space Random Walk Metropolis- Hastings algorithm (pCN) 

For the purpose of the numerics, we use a version of the Random Walk Metropolis Hastings 
(RWMH) algorithm framed on function spaces, referred to as the pCN algorithm[T2l [16]. This 
method is superior to the usual vanilla random walk method in that it is well-defined on function 
spaces, and as such is robust under refinement of any discretisation of the forward model [16]. 
This means that the rate of convergence of statistics in the Markov chains are independent of 
this discretisation, whereas the Metropolis Hasting algorithm with vanilla random walk proposal 
degenerates as the forward model's discretisation is refined. The following describes the pCN 
sampler, where $ = ^||^(po; ^) ~ vWh- 

1: Mo = (po, 

2: for i = 1 : do 

3: Sample f = (1 — + (3w, where w ~ /io((Po; ^)) 

4: a(uj_i,t>) = min {1, exp($(Mj_i) — ^{v)} 
5: Sample u ~ f/([0,l]) 
6: if M < a{ui-i,v) then 

7: Ui = V 

8: else 

9: Ui = Ui-i 

10: end if 
11: end for 

The acceptance probability in this algorithm is independent of the approximation method used 
for the unknown functions, and as such has superior mixing properties as compared with standard 
methods for finer discretisations. Further to this basic algorithm, we can alter this to also use 
the data to infer on the variance of the observational noise, if we no longer assume that this is 



known [I6]: we make use of this in the numerical experiments in Sections 7.1 and 7.3 



MCMC methods usually need a period of "burn-in" in which we try to ensure that the chain 
starts in a region which is not in the tails of the posterior distribution. MAP estimators can be 
used to try to find regions of high posterior density, as described in [13] . For the problem as we 
have defined it, this equates to finding the solution to: 

min L{po,u) = min h\G{po,i^) - y\\l + l\\iPo,i^)\\lo 

where ||(-,-)IUo the equivalent penalty term to that induced in the posterior measure by the 
choice of prior measure /iq, or the Cameron- Martin norm corresponding to /io- In particular, if (as 
is the case for our purposes) /io(po) ^) = -^(0, ^i?/^"^) x Af{0, 62'H~°'^), then 
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Various descent methods can be used to try to find local minima of this quantity. Most of 
these methods incorporate gradient information of some type to attempt to search for the minima 
in appropriate directions. These methods include steepest descent and conjugate gradient among 
others. 

The method that we will utilise in our numerics in order to initialise our Markov chains is 
called the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method |22l |23]. This method uses the 
gradient information from the last two states in the chain to approximate a Hessian matrix, which 
is then used to choose an appropriate direction in which to search for the local minimum. 

Once the BFGS method has converged within certain tolerances, the state that has been 
settled on can be used as an initial state in the MCMC chain. Assuming that the BFGS method 
has converged close to a global or even local maxima of probability density, then the chain will 



more quickly enter probabilistic equilibrium. For example, consider the data used in Section 7.3 



The ratio of probability densities of the state {po, v) = (0, 0) with the state the BFGS method 
converged to was exp(— 958.85), a huge difference. 



6. Numerical Approximation of Q 

The observation operator consists of the concatenation of four maps: 1) the generation of the 
reparameterisation variable r] from the vector field u according to ( [Io| ), 2) the application of the 
reparameterisation formula according to ([9]), 3) the time-1 flow map of the geodesic equations 
according to ([5]j7]), and 4) the evaluation of the curve q\t=i at the finitely chosen points. Stages 
(1) and (3) are both numerically discretised using particle-mesh methods as described in [2^ . 
These methods allow efficient evaluation of the velocity field at particle locations whilst preserving 
variational structure and making it possible to compute the exact adjoint equations of the discrete 
system which are used in the implementation of the deterministic burn-in. Stages (2) and (4) 
use cubic B-spline interpolation, which also make computing the adjoint tractable. Details of 
the numerical approximation procedure, along with numerical tests that show that the numerical 
scheme is second-order convergent in space, are given in |20]. In summary, the discretisation 
replaces the parameterised curve q{s) as an ordered discrete set of points {qi}^Zi, similarly the 
conjugate momentum p(s) is replaced by a set Pi^Zi of (co)-vectors associated with each point. 
Convergence is achieved as rip — )■ oo, assuming a sufficiently smooth velocity field; the convergence 
rate was demonstrated to be second-order in [20]. The velocity field u is discretised on a static, 
uniform mesh with rig x rig grid points. In the numerical algorithm, the momentum is interpolated 
from the points to the grid using cubic B-splines; the elliptic operator A is then approximately 
inverted on the grid using discrete Fourier transform, and the resulting velocity is interpolated 
back to the points g^. As described in [21], this interpolation preserves the Hamiltonian structure 
of the equations, and as described in [20], allows an exact adjoint of the numerical model to be 
constructed, which is used to obtain derivatives of the observation operator. Convergence was also 
demonstrated to be second-order as rig ^ oo in pO]. Alternatively, one can select a finite value 
for rig whilst keeping q and p continuous; this just leads to a modification of the operator A. The 
discretisation for the reparameterisation variable u was introduced in |20j. The reparameterisation 
velocity z/(s) is also discretised as a discrete set of values reparameterisation map 

is obtained by interpolating z/ to the reparameterised locations; q and p are then pulled back under 



Bayesian data assimilation in shape registration 



14 



the reparameterisation map by interpolation. As discussed in [11], reparameterisation and flow 
along geodesies are commuting operations; in [20] the commutation error was shown to converge 
to zero at second-order as rip — )• oo. 

7. Numerical Results 

We present numerical results to show that the algorithm is successfully drawing samples from the 
posterior distribution, and then go on to study the posterior distribution in certain data scenarios. 
The first thing to address is our choice of template shape and the parameterisation q^{s) that we 
use for this shape. Since we are only considering closed curves, it seems natural to choose a circle 
for this to keep things as simple as possible. We also wish to choose a nice smooth parameterisation 
for this shape, centred in the middle of our domain = [0, 27r)^ so we pick 

= (cos(s) + 7r,sin(s) + 7r), sG[0,27r). (29) 

We will engage in simulation studies in which the data is itself produced by employing the numerical 
simulation of a forward PDE model. In all the numerics, we assume that the noise ^ through which 
we make the observations 

y = g{po,u) + ^, e~Ar(o,s), 

has a diagonal covariance matrix S, which we describe for each experiment. 

In terms of the approximation of the forward model in the algorithm, 50 time steps are used in 
which to deform the parameterisation of the shape from time t = to t = 1. The curves themselves 
are approximated by 100 points, and with a 64 x 64 grid approximating the underlying velocity 

field. The values of Sj at which observations are made are given by | | . These parameters 

are used in the model to create the data, and in the implementation of the statistical algorithm. 

The prior distributions on pq and u are M{0,6i'H~°'^) and Af{0,6i'H~°'^) respectively, with 
«! = 0.55, a2 = 1.55, 6i = 30 and 62 = 0.05. Note that these parameters are sufficient to ensure 
that Corollary [9] holds. 

In the following sections we plot normalised histograms for particular Fourier modes of the 
two functions of interest, from converged Markov chains which are invariant with respect to the 
posterior distribution as described in Section [5j 

7.1. Posterior Consistency 

It seems reasonable to expect that as we increase the amount of informative data that we are using 
in our inference, the closer our posterior mean will be to the functions that created the data, and 
that at the same time the uncertainty in that estimation will decrease. Posterior consistency is 
not yet a developed topic in inverse problems, with most theory developed only in the Gaussian 
(and often linear) cases [251 ISHl [27]. As such we do not aim to prove posterior consistency in this 
context, but instead aim to present results which appear to indicate that this is indeed the case 
in this application. In this set of numerical experiments, we take a draw [po, v) from the prior 
measure, and using our approximation of the forward model, create data y such that 
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with increasing N. 

In this experiment, the data is simulated using the numerical approximation of the forward 
model, but in order to avoid committing an inverse crime[2B] we use a much higher resolution 
of the curve when simulating the data. 1000 points were used to describe the curve in the data 
creation routines, while just 100 points were used within the approximation used for the MCMC 
algorithm. The initial momentum and reparameterisation functions that created the data were 
drawn from the prior distributions. Mean- zero Gaussian noise with covariance matrix S = a^/ 
with a = 10~^ was added, but was not assumed to be known. In the results that follow, the 



MCMC method described in Section 5A_ was used to sample from the joint distribution on the 
functions of interest (po, and the observation noise variance a^. 



TVue P 
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(a) 



(b) 



Figure 2. Plots of the mean values (solid lines) of (a) po and v occurring in the posterior 
distribution using data with increasing number of points, with observational noise covariance equal 
to a I with a — 0.01. The dashed lines denote ±3 standard deviations (approximately 99.7% 
confidence interval) 



Figure [2] shows plots representing the mean and variance of the approximation of the functions 
Po (left) and u (right) given an increasing number of observations. As the number of observations 
is increased, the better we recover the actual value of z/ that was used to create the data, and the 
level of uncertainty in the distribution is reduced. The convergence of Pq is not so clear, as the 
convergence is not expected to be pointwise in this case, but in L'^{S^). 

However, we can look at the deformed shapes that are produced by the mean initial momenta 
do shown in Figure [2] These are shown in Figure [s] (left). The marginal distributions on the 
observational noise variance as estimated by the MCMC methods are plotted in Figure [s] (right). 
These clearly show that as the number of observations increases, the distribution is becoming 
increasingly peaked around the value that was used when creating the data. 



7.2. Effect of lengths cale on multimodality 

One might ask what the advantage of full posterior sampling is over other less computationally 
expensive optimisation approaches. In this section we show that certain data scenarios can cause 
the posterior distribution on the initial momentum po to be complex with many local maxima of 
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Figure 3. (a) Plots of the deformed shapes using the mean values of Pq from Figure^ (h) Marginal 
distribution of observational noise variance in the MCMC chains. 



probability density. In this case the solution from an optimisation approach may not be unique, 
depending on the initial state of the solver. Being able to characterise the whole of the distribution 
in this case allows us to identify several different possibilities for the function pq, and to get a 
better idea of the uncertainty in the problem. One such data scenario that exhibits this posterior 
multimodality is where we have features in the data whose lengthscale is smaller than the filtering 
lengthscale a in the metric for the diffeomorphism, as we demonstrate in the following example. 
Let us once again set the initial shape to be that defined by the parameterisation p^{s) given 



in (29). We then define F (r) for r G [0, 1], such that it consists of a square of length 2, centred at 



[tt, tt], with quarter-circles of radius r continuously added on each corner. 




Figure 4. Plot of the target shape with varying r, the radius of the quarter circles on the corners 
of the square. 



Figure |4] shows the target shape for a range of radii r. Note that for r = 0, the shape is exactly 
a square of length 2, and for r = 1 the shape is a circle of radius 2, exactly overlapping the template 
shape (29 ). The variable r defines the minimum lengthscale of features in the data. Once r becomes 
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smaller than the maximum lengthscale than can be resolved by the model, we would expect to see 
multimodality creeping into the posterior distribution on pq. Since the template shape cannot be 
bent to imitate the features on such a small scale, near to this feature only some of the observation 
points can be well-matched for any given po- This problem does not cause multimodality in the 
posterior distribution on the reparameterisation function u since the observation points on the 
template shape do not need to move in order to minimise the distance between themselves and 
the observations for any of the possible Pq configurations in the multimodal posterior on the initial 
momentum. 

In the following experiments we look at what happens to the posterior distribution on {po, v) as 
r starts at 1 and steadily decreases. For each value of r, we consider the shape r^(r) as described 
above, and place 10^ observation points along the shape, evenly spaced in terms of arc length. 
A large number of observations are used so that the data can resolve small lengthscales. We use 
these observation points as our data, with no noise added, to isolate the cause of the multimodality 
purely to the lengthscale of the features in the data. For the function $ = |||^(j9o, ~ l/lll 
likelihood functional, we pick S = cr^J with a = 10~^. 




Figure 5. Plot of marginal distributions on (a) the lowest, (b) the second lowest wave number 
Fourier mode of the initial momentum function po, given varying value of r, the radius of the 
quarter circles in the target shape. 



Figure |5] shows the marginal distributions on the first two Fourier modes of the initial 
momentum po as the radius of the quarter circles in the data is reduced. Notice that for r = 1, where 
the data is simply a circle, the distributions are smooth, monomodal and bell-shaped. However, 
as r is reduced, the distributions become increasingly irregular, and in some cases multimodal. 
Moreover, these later examples take a great deal of time to converge as the posterior distribution 
is more complex, and it is harder to be sure that we have adequately explored the whole of the 
probability measure. However, these results show that the posterior measures arising from this 
application are non-Gaussian, and not necessarily monomodal. Simply finding the local maxima 
of probability density for these examples would not be a good description of the nature of the 
distribution as a whole, not would it help us in determining the uncertainty in the problem. 

Similarly, Figure [6] shows the marginal distributions on the first two Fourier modes of the 
reparameterisation function v as the radius of the quarter circles in the data is reduced. Note that 
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(a) (b) 

Figure 6. Plot of marginal distributions on (a) the lowest, (b) the second lowest wave number 

Fourier mode of the reparameterisation function v, given varying value of r, the radius of the 
quarter circles in the target shape. 



the distributions on z/ are monomodal in nature, in contrast to those on p in this data scenario. 
7.3. Partial observations 

Another area of interest to many practitioners is algorithms which try to recover information in 
the case where our observations of the target shape are only partial. We may only observe part 
of the shape, or our observations in some regions of the shape may be essentially destroyed by 
excessive noise. In this case we would expect to find many different possible shapes which could 
fit with our data equally well, meaning that the uncertainty quantification through full sampling 
of the Bayesian posterior is informative. 

Let us consider the scenario where we have 1000 fairly evenly spaced observations of the target 



shape, where the template shape is given by (29) and where the functions (po) ^) that create the 



target shape and the positions of the observation points are draws from the prior. Suppose now 
that the observation noise is no longer i.i.d. around the entire shape, but instead we have two 
regions with different covariance structure. In one region, covering approximately three quarters 
of the shape's circumference, the noise is relatively small, distributed as A/^(0, V) with V = ajjl 
with CT/) = 0.0001. In the final quarter of the domain we add noise with much larger variance, with 
ai = 0.1. This value is so large that most of this data is rendered useless, in terms of extrapolating 



information about the value of the functions po and z/. As in Section 7.1, a very fine resolution of 
the curve was used when simulating the data (1000 points), and a coarser representation was used 
within the MCMC method (100 points). 

In this data scenario, we must also be careful in our choice of functional ^{po, ^) = 
1^) ^IItpo likelihood. If we were to set Fpo — (jlj^ over the whole domain, then 

we would be attempting to fit the shape very closely to the very noisy observations in the final 
third, which would cause all manner of problems. On the other hand, if we were to take Fpo = ct|^, 
we would be assuming a lot more uncertainty in the majority of the observations than is actually 



the case. In the numerics that follow, as we did in Section |7.1[ we treat the observation noise 
variance o"^ as an unknown also to be found. 
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Figure 7. Plot of partial data, due to large noise variance in one region of the observed shape. 
Green line denotes the converged solution of the corresponding optimisation problem. 



Figure [7] shows an example of such a data scenario. The stars denote data points, and the 
green curve denotes the converged solution of the corresponding optimisation problem as described 



in Section 5A_ However, this single curve does not tell us the whole story, since there are many 
curves that could satisfy the data (almost) as well as this curve. 





(a) 



(b) 



Figure 8. Plot of 20 shapes arising from samples from the posterior distribution using the data 
plotted m alongside (a) the data points, (b) the true shape (black curve). Since the recovered 
noise model assumes uniform observational noise over the whole shape, there is higher variability 
over the whole shape due to the extra noise in 1/4 of the data. 



Figure |8] represents this variability through the plots of 20 shapes arising from samples of the 
posterior distribution using this data. The data is very poor in one region, and as such there are 
several shapes which match the data equally as well. The data is so poor in this one region that 
the recovery of the true shape in this area is poor. 

This variability can also be seen in Figure [9} where the mean and variance of the functions po 
and u in posterior distribution are visualised. 
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Figure 9. Plots of the mean values (solid lines) of (a) po and v occurring in the posterior 
distribution using the data plotted in The dashed lines denote ±3 standard deviations 

(approximately 99.7% confidence interval) 




Figure 10. Marginal distribution of observational noise variance in the MCMC chain, using data 
plotted The two different variances used when creating the data are also shown. 

Figure shows the marginal distributions on the observational noise variance as estimated by 
the MCMC methods. The non-uniformity of the noise used when creating the data cannot be 
recovered in this setting since we assume uniformity over the whole shape in our prior on the 
observational noise variance. Therefore, the recovered values are somewhere between the poor and 
good SNR regimes. Note that in comparison with the distributions in Figure [2| the variances in 
these distributions are quite similar, despite having a great deal more observations. With a region 
with a much poorer SNR, the uncertainty in the system is much greater, and as such there is still 
a large region of the state space with similar probability density. The approach we have used in 
this paper allows us to quantify this uncertainty as well as finding states of maximal likelihood. 
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8. Conclusions 

This paper presents a Bayesian approach to shape registration which takes into account the fact 
that observations of a shape are not exact, and gives a distribution on the shortest distance in 
shape space between the template shape to the observed shape. We concentrate on the case of a 
finite number of observations of points from the target curve, and our approach makes use of the 
use of explicit reparameterisation variable as described in [TTl |20] . The sampling method is framed 
on function space and so is robust under refinements of discretisation [TB]. 

By careful analysis of the forward problem, we have been able to formulate a well-posed 
Bayesian inverse problem regarding shape matching of a curve with a set of noisy observations 
of another. We have shown that the likelihood function is Lipschitz continuous on a space which 
has full measure with respect to a specified choice of Gaussian prior measure. Using this, we 
have shown how to draw samples from well-defined posterior distributions using the pCN MCMC 
sampler on function space. This choice of algorithm prevents slower convergence of the statistical 
algorithm as the discretisation of the observation operator is refined. We have then implemented 
this algorithm, and presented briefly some illustrative numerics. 

In these numerics, we first showed that the posterior distribution shows consistency as the 
number of observations of a recoverable shape are increased, and the posterior distributions on the 
functions of interest become increasingly peaked on the functions which were used to create the 
data. We also showed an example, where the lengthscale of features in the data is small, in which 
the posterior distributions exhibit multimodal behaviour, indicating that full characterisation of 
the posterior can give us more information than the solution of an equivalent optimisation problem. 
Since the shape space is a Riemannian manifold with regions of positive and negative curvature, 
there may be more than one geodesic between two points in shape space, which may also lead to 
multimodality. 

Finally we showed that this method can give us a range of different possible solutions in the 
case of lost or destroyed data for some part of the target shape. 

Since we already have at our disposal an implementation of the adjoint problem so that we 
can calculate the gradient of the observation operator (which we use in the BFGS method for 
the deterministic burn-in), we can very simply adapt the MCMC method from random walk to 
a gradient method, such as the Metropolis Adjusted Langevin Algorithm (MALA). This would 
increase the efficiency of the algorithm markedly. Further analytical results would be needed, 
however, to ensure that the gradient of the observation operator is continuous on a space which 
has full measure with respect to an appropriately chosen prior distribution. 

Another extension of this work would be to also make the problem translation, rotation and 
scale invariant, so that any misalignment of the imaging equipment has a negligible effect on 
the results. This would involve adding parameters into the state space to allow for these types 
of operations. One might also consider the case where the template shapes themselves are only 
noisily observed, so that we are trying to find a distribution on the length of geodesic paths in 
shape space between two noisily observed shapes. The problem could also be extended to the case 
in which the ordering of the observed points around the curve is not known, in which case the 
discrete ordering would also have to be learned as part of the inverse problem. This could be done 
by combining the present approach with a Gibb's sampler for the ordering. This would result in 
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a Bayesian approach to the segmentation process, in which points on the boundary between two 
materials might first be estimated from a bitmapped image but their ordering is not known with 
100% confidence. The result would be a probability distribution on the boundary curve with some 
prescribed topology. 
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